Data Mining within a Regression Framework

Author

  • Richard A. Berk
Abstract

Regression analysis can imply a broader range of techniques than is ordinarily appreciated. Statisticians commonly define regression so that the goal is to understand “as far as possible with the available data how the conditional distribution of some response y varies across subpopulations determined by the possible values of the predictor or predictors” (Cook and Weisberg, 1999: 27). For example, if there is a single categorical predictor such as male or female, a legitimate regression analysis has been undertaken if one compares two income histograms, one for men and one for women. Or one might compare summary statistics from the two income distributions: the mean incomes, the median incomes, the two standard deviations of income, and so on. One might also compare the shapes of the two distributions with a Q-Q plot. There is no requirement in regression analysis for there to be a “model” by which the data were supposed to be generated. There is no need to address cause and effect. And there is no need to undertake statistical tests or construct confidence intervals. The definition of a regression analysis can be met by pure description alone. Construction of a “model,” often coupled with causal and statistical inference, is a supplement to a regression analysis, not a necessary component (Berk, 2003). Given such a definition of regression analysis, a wide variety of techniques and approaches can be applied. In this chapter I will consider a range of procedures under the broad rubric of data mining.
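
The purely descriptive comparison described in the abstract can be sketched in a few lines. This is a minimal illustration, not the chapter's own procedure: the income values, the group labels, and the helper names `describe` and `qq_pairs` are all hypothetical. It summarizes the conditional distribution of income for each group and pairs matching quantiles, which are the coordinates a Q-Q plot would display.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical incomes (in thousands); illustrative values only, not real data.
incomes = {
    "female": [28, 31, 35, 40, 42, 47, 55, 61],
    "male": [30, 33, 38, 41, 46, 52, 60, 75],
}

def describe(values):
    """Summarize one conditional distribution of the response."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}

def qq_pairs(a, b, n=4):
    """Pair matching quantiles of two samples (the points of a Q-Q plot)."""
    return list(zip(quantiles(a, n=n), quantiles(b, n=n)))

# One summary per subpopulation defined by the categorical predictor.
summaries = {group: describe(values) for group, values in incomes.items()}
pairs = qq_pairs(incomes["female"], incomes["male"])

for group, stats in summaries.items():
    print(group, stats)
print("quartile pairs (female, male):", pairs)
```

No model, test, or confidence interval appears here; the comparison of the two conditional distributions is the whole analysis, which is the sense of regression the abstract describes.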

Similar resources

An Integrated DEA and Data Mining Approach for Performance Assessment

This paper presents a data envelopment analysis (DEA) model combined with bootstrapping to assess the performance of one of the data mining algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...

A New Approach for Obtaining Settling Velocity in a Thickener Using Statistical Regression: A Case Study

In this research work, the parameters affecting the settling velocity within the thickeners were studied by introducing an equivalent shape factor. Several thickener feed samples of different densities including copper, lead and zinc, and coal were prepared. The settling tests were performed on the samples, and the corresponding settling curves were plotted. Using the linear regression analysis...

Development of a framework to evaluate service-oriented architecture governance using COBIT approach

Nowadays organizations require an effective governance framework for their service-oriented architecture (SOA) in order to enable them to use a framework to evaluate their current state governance and determine the governance requirements, and then to offer a suitable model for their governance. Various frameworks have been developed to evaluate the SOA governance. In this paper, a brief introd...

Context-based Distributed Regression in Virtual Organizations

The characteristics of virtual organizations present significant challenges to both distributed data mining methods within a metalearning framework and statistical multi-level models. Using hierarchical models, this paper explicitly addresses the context heterogeneity existing across the partners of virtual organizations. Two new approaches of context-based distributed data mining are analyzed an...

Contour regression: A distribution-regularized regression framework for climate modeling

Regression methods are commonly used to learn the mapping from a set of predictor variables to a continuous-valued target variable such that their prediction errors are minimized. However, minimizing the errors alone may not be sufficient for some applications, such as climate modeling, which require the overall predicted distribution to resemble the actual observed distribution. On the other ha...

Customer Behavior Mining Framework (CBMF) using clustering and classification techniques

The present study proposes a Customer Behavior Mining Framework on the basis of data mining techniques in a telecom company. This framework takes into account the customers’ behavior patterns and predicts the way they may act in the future. First, a clustering technique is used to implement portfolio analysis, and previous customers are divided based on socio-demographic features using k...

Publication date: 2005